Multiple testing in high-dimensional linear regression
Abstract
In many real-world statistical problems, we observe a large number of potentially explanatory variables of which a majority may be irrelevant. For this type of problem, controlling the false discovery rate (FDR) guarantees that most of the discoveries are truly explanatory and thus replicable. In this talk, we propose a new method named SLOPE to control the FDR in sparse high-dimensional linear regression. This computationally efficient procedure works by regularizing the fitted coefficients according to their ranks: the higher the rank, the larger the penalty. This is analogous to the Benjamini-Hochberg procedure, which compares more significant p-values with more stringent thresholds. Whenever the columns of the design matrix are not strongly correlated, we show empirically that SLOPE obtains FDR control at a reasonable level while offering substantial power. Although SLOPE is developed from a multiple testing viewpoint, we show the surprising result that it achieves optimal squared errors under Gaussian random designs over a wide range of sparsity classes. An appealing feature is that SLOPE does not require any knowledge of the degree of sparsity. This adaptivity to unknown sparsity has to do with the FDR control, which strikes the right balance between bias and variance. The proof of this result presents several elements not found in the high-dimensional statistics literature.
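The rank-dependent penalty and its analogy with the Benjamini-Hochberg procedure can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names are hypothetical, the penalty is written as a sorted-L1 norm (the i-th largest coefficient magnitude is multiplied by the i-th largest weight), and the weights follow a Benjamini-Hochberg-style normal quantile sequence with target level q. It evaluates the penalty only and does not fit a regression.

```python
from statistics import NormalDist


def bh_lambdas(p, q=0.1):
    """Assumed BH-style weights: lambda_i = Phi^{-1}(1 - i*q / (2*p)), i = 1..p.

    The sequence is decreasing, so the largest weight goes with rank 1.
    """
    nd = NormalDist()
    return [nd.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_penalty(beta, lambdas):
    """Sorted-L1 penalty: sum over i of lambda_i * |beta|_(i).

    |beta|_(1) >= |beta|_(2) >= ... are the coefficient magnitudes in
    decreasing order, so larger coefficients receive larger penalties.
    """
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


def benjamini_hochberg(pvalues, q=0.1):
    """Benjamini-Hochberg step-up procedure; returns indices of rejections.

    The k-th smallest p-value is compared with k*q/m, so the most
    significant p-values face the most stringent thresholds.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


if __name__ == "__main__":
    lambdas = bh_lambdas(p=5, q=0.1)
    print(lambdas)
    print(sorted_l1_penalty([0.0, -2.0, 0.5, 0.0, 1.0], lambdas))
    print(benjamini_hochberg([0.001, 0.5, 0.02, 0.9], q=0.1))
```

With all weights equal, the sorted-L1 penalty reduces to the ordinary L1 (lasso) penalty; the decreasing weights are what make the penalty depend on rank.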
Similar resources
EVALUATION OF CONCRETE COMPRESSIVE STRENGTH USING ARTIFICIAL NEURAL NETWORK AND MULTIPLE LINEAR REGRESSION MODELS
In the present study, two data-driven models, an artificial neural network (ANN) and a multiple linear regression (MLR) model, have been developed to predict the 28-day compressive strength of concrete. Seven parameters, namely 3/4 mm sand, 3/8 mm sand, cement content, gravel, maximum size of aggregate, fineness modulus, and water-cement ratio, were considered as input variables...
Testing a single regression coefficient in high dimensional linear models.
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predict...
Joint Testing and False Discovery Rate Control in High-Dimensional Multivariate Regression
Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse...
Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.
Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordina...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: As science, knowledge, and technology advance, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation and interpretation of the coefficients. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
Publication date: 2016